Corelab Seminar
2010-2011
Paris Koutris (University of Washington, NTUA)
Parallel Evaluation of Conjunctive Queries
Abstract.
The availability of large data centers with tens of thousands of
servers has led to the popular adoption of massive parallelism for
data analysis on large data
sets. Several query languages exist for
running queries on massively parallel architectures, some based on
the MapReduce infrastructure, others using proprietary
implementations. Motivated by this trend, we analyze the
parallel complexity of conjunctive queries. We propose a very
simple model of parallel computation that captures the
searchitectures, in which the complexity parameter is the number of
parallel steps requiring synchronization of all servers. We study
the complexity of conjunctive queries and give a complete
characterization of the queries which can be computed in one
parallel step. These form a strict subset of hierarchical queries,
and include flat queries like R(x,y), S(x,z), T(x,v),U(x,w), tall queries
like R(x), S(x,y), T(x,y,z), U(x,y,z,w), and combinations thereof, which
we call tall-flat queries. We describe an algorithm for computing inparallel any tall-flat query,
and prove that any query that is not tall-flat
cannot be computed in one step in this model. Finally, we present
extensions of our results to queries that are not tall-flat.